A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games
This paper proposes novel, end-to-end deep reinforcement learning algorithms
for learning two-player zero-sum Markov games. Our objective is to find the
Nash Equilibrium policies, which are free from exploitation by adversarial
opponents. Distinct from prior efforts on finding Nash equilibria in
extensive-form games such as Poker, which feature tree-structured transition
dynamics and discrete state space, this paper focuses on Markov games with
general transition dynamics and continuous state space. We propose (1) Nash DQN
algorithm, which integrates DQN with a Nash finding subroutine for the joint
value functions; and (2) Nash DQN Exploiter algorithm, which additionally
adopts an exploiter for guiding the agent's exploration. Our algorithms are
practical variants of theoretical algorithms that are guaranteed to converge
to Nash equilibria in the basic tabular setting. Experimental evaluation on
both tabular examples and two-player Atari games demonstrates the robustness of
the proposed algorithms against adversarial opponents, as well as their
advantageous performance over existing methods.
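The Nash-finding subroutine in such algorithms solves a zero-sum stage game over the joint value function at each state. As a rough illustration of that subroutine (not the paper's implementation; the function name and iteration budget are our own), fictitious play approximates the equilibrium mixture of a payoff matrix:

```python
def fictitious_play(payoff, iters=20000):
    """Approximate a Nash equilibrium of a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j. Each player repeatedly best-responds to the opponent's
    empirical action frequencies; in zero-sum games these frequencies
    converge to an equilibrium (Robinson, 1951).
    """
    n, m = len(payoff), len(payoff[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row_i, col_j = 0, 0  # arbitrary initial actions
    for _ in range(iters):
        row_counts[row_i] += 1
        col_counts[col_j] += 1
        # Row maximizes expected payoff against column's empirical mixture.
        row_i = max(range(n),
                    key=lambda i: sum(payoff[i][j] * col_counts[j]
                                      for j in range(m)))
        # Column minimizes row's expected payoff against row's mixture.
        col_j = min(range(m),
                    key=lambda j: sum(payoff[i][j] * row_counts[i]
                                      for i in range(n)))
    return ([c / iters for c in row_counts],
            [c / iters for c in col_counts])

# Rock-paper-scissors: the unique equilibrium is uniform (1/3, 1/3, 1/3).
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
row_mix, col_mix = fictitious_play(rps)
```

In Nash DQN this stage game would be built from the learned joint Q-values rather than a fixed matrix, and solved at every update step.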
Research on the Evaluation of Green Logistics Based on Cloud Model
According to the theory of sustainable development, and in view of the current development of the social logistics industry and the characteristics of green logistics, this paper constructs a green logistics evaluation index system for logistics businesses. The cloud model and the Delphi method are used to calculate the cloud weights of the evaluation indices, and a cloud generator realizes the conversion between qualitative and quantitative indicators. An empirical study takes Jiangsu Province as an example: the cloud model and its algorithm produce an evaluation cloud for green logistics, and comparing the evaluation cloud chart with the ruler cloud chart makes the results easy to read and problems easy to spot. The evaluation results show that the cloud-model-based approach is reasonable and improves the credibility of the evaluation.
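A cloud model describes a qualitative concept by three numbers: expectation Ex, entropy En, and hyper-entropy He. The cloud generator mentioned above turns these into "cloud drops". The following is a generic forward normal cloud generator, a standard construction in cloud-model theory rather than code from this paper, with illustrative parameter values of our own choosing:

```python
import math
import random

def forward_cloud(ex, en, he, n, seed=0):
    """Forward normal cloud generator.

    ex: expectation, en: entropy, he: hyper-entropy of the concept.
    Returns n cloud drops as (x, membership) pairs: each drop samples a
    per-drop entropy En' ~ N(en, he^2), a position x ~ N(ex, En'^2), and
    a membership degree from the Gaussian membership function.
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_prime = rng.gauss(en, he)            # per-drop entropy
        x = rng.gauss(ex, abs(en_prime))        # drop position
        mu = (math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
              if en_prime else 1.0)             # membership degree
        drops.append((x, mu))
    return drops

# Hypothetical index score concept: "good" around 75 on a 0-100 scale.
drops = forward_cloud(ex=75.0, en=5.0, he=0.5, n=2000)
mean_x = sum(x for x, _ in drops) / len(drops)
```

Plotting the drops `(x, mu)` yields the cloud chart that the evaluation compares against the ruler cloud chart.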
Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence
Learning agents that are capable not only of taking tests but also of
innovating is becoming a hot topic in AI. One of the most promising paths
towards this vision is multi-agent learning, where agents act as the
environment for each other, and improving each agent means proposing new
problems for others. However, existing evaluation platforms are either not
compatible with multi-agent settings, or limited to a specific game. That is,
there is not yet a general evaluation platform for research on multi-agent
intelligence. To this end, we introduce Arena, a general evaluation platform
for multi-agent intelligence with 35 games of diverse logics and
representations. Furthermore, multi-agent intelligence is still at the stage
where many problems remain unexplored. Therefore, we provide a building toolkit
for researchers to easily invent and build novel multi-agent problems from the
provided game set based on a GUI-configurable social tree and five basic
multi-agent reward schemes. Finally, we provide Python implementations of five
state-of-the-art deep multi-agent reinforcement learning baselines. Along with
the baseline implementations, we release a set of 100 best agents/teams that we
trained with different training schemes for each game, as a basis for
evaluating agents against population performance. As such, the research community
can perform comparisons under a stable and uniform standard. All the
implementations and accompanying tutorials have been open-sourced for the
community at https://sites.google.com/view/arena-unity/
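A reward scheme decides how each agent's raw reward is recombined across teammates and opponents. The sketch below illustrates the general idea with three hypothetical scheme names of our own; Arena defines its own set of five schemes, which may differ:

```python
def team_rewards(raw, teams, scheme):
    """Recombine per-agent raw rewards under a simple reward scheme.

    raw:    {agent: raw reward this step}
    teams:  {agent: team id}
    scheme: 'isolated'      -> each agent keeps its own reward
            'collaborative' -> agents share their team's mean reward
            'competitive'   -> team mean minus the mean reward of all
                               agents on other teams
    (Illustrative scheme names, not Arena's actual API.)
    """
    team_members = {}
    for agent, t in teams.items():
        team_members.setdefault(t, []).append(agent)
    team_mean = {t: sum(raw[a] for a in members) / len(members)
                 for t, members in team_members.items()}
    out = {}
    for agent, t in teams.items():
        if scheme == 'isolated':
            out[agent] = raw[agent]
        elif scheme == 'collaborative':
            out[agent] = team_mean[t]
        elif scheme == 'competitive':
            others = [raw[a] for a in raw if teams[a] != t]
            opp = sum(others) / len(others) if others else 0.0
            out[agent] = team_mean[t] - opp
        else:
            raise ValueError(scheme)
    return out

# Two agents on team A, one on team B.
raw = {'a1': 1.0, 'a2': 0.0, 'b1': 0.5}
teams = {'a1': 'A', 'a2': 'A', 'b1': 'B'}
coll = team_rewards(raw, teams, 'collaborative')
comp = team_rewards(raw, teams, 'competitive')
```

Composing such schemes over a tree of teams is what a GUI-configurable social tree would parameterize.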
Implicit Neural Representation for Cooperative Low-light Image Enhancement
The following three factors restrict the application of existing low-light
image enhancement methods: unpredictable brightness degradation and noise,
inherent gap between metric-favorable and visual-friendly versions, and the
limited paired training data. To address these limitations, we propose an
implicit Neural Representation method for Cooperative low-light image
enhancement, dubbed NeRCo. It robustly recovers perceptually friendly results in
an unsupervised manner. Concretely, NeRCo unifies the diverse degradation
factors of real-world scenes with a controllable fitting function, leading to
better robustness. In addition, for the output results, we introduce
semantic-oriented supervision with priors from a pre-trained
vision-language model. Instead of merely following reference images, this
encourages results to meet subjective expectations, finding more
visually friendly solutions. Further, to ease the reliance on paired data and
reduce solution space, we develop a dual-closed-loop constrained enhancement
module. It is trained cooperatively with other affiliated modules in a
self-supervised manner. Finally, extensive experiments demonstrate the
robustness and superior effectiveness of our proposed NeRCo. Our code is
available at https://github.com/Ysz2022/NeRCo
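An implicit neural representation fits a network that maps pixel coordinates to intensities. A common ingredient of such coordinate-based models (a generic technique, not necessarily NeRCo's exact encoding) is lifting raw coordinates into sinusoidal features so the network can fit high-frequency image content:

```python
import math

def fourier_features(coord, num_freqs=4):
    """Map a normalized coordinate in [0, 1] to sinusoidal features.

    Returns [sin(pi*c), cos(pi*c), sin(2*pi*c), cos(2*pi*c), ...] with
    frequencies doubling num_freqs times. Raw (x, y) inputs fed to an MLP
    produce overly smooth fits; this lifting is the usual remedy.
    """
    feats = []
    for k in range(num_freqs):
        f = (2 ** k) * math.pi
        feats.append(math.sin(f * coord))
        feats.append(math.cos(f * coord))
    return feats

def encode_pixel(x, y, width, height, num_freqs=4):
    """Encode a pixel position as the MLP input feature vector."""
    return (fourier_features(x / (width - 1), num_freqs)
            + fourier_features(y / (height - 1), num_freqs))

vec = encode_pixel(10, 20, width=64, height=64)
```

The enhancement network then predicts the enhanced intensity at each encoded coordinate, which is what makes the representation resolution-agnostic.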
CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario
Traffic signal control is an emerging application scenario for reinforcement
learning. Besides being an important problem that affects people's daily
commutes, traffic signal control poses unique challenges for reinforcement
learning in terms of adapting to dynamic traffic environments and
coordinating thousands of agents, including vehicles and pedestrians. A key
factor in the success of modern reinforcement learning is a good
simulator that can generate a large number of data samples for learning. The most
commonly used open-source traffic simulator, SUMO, is however not scalable to
large road networks and heavy traffic flows, which hinders the study of
reinforcement learning on traffic scenarios. This motivates us to create a new
traffic simulator, CityFlow, with fundamentally optimized data structures and
efficient algorithms. CityFlow supports flexible definitions of road
networks and traffic flows based on synthetic and real-world data. It also
provides a user-friendly interface for reinforcement learning. Most importantly,
CityFlow is more than twenty times faster than SUMO and is capable of
supporting city-wide traffic simulation with an interactive renderer for
monitoring. Besides traffic signal control, CityFlow could serve as the base
for other transportation studies and create new possibilities for testing
machine learning methods in the intelligent transportation domain. (Comment: WWW 2019 Demo Paper)
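The RL interface such a simulator exposes typically follows the standard reset/step loop. The toy single-intersection environment below shows the shape of that loop for signal control; it is a hypothetical stand-in, not CityFlow's actual API:

```python
import random

class ToyIntersectionEnv:
    """Toy single-intersection environment (illustrative, not CityFlow).

    State:  queue lengths on the north-south and east-west approaches.
    Action: 0 -> green for north-south, 1 -> green for east-west.
    Reward: negative total queue length (shorter queues are better).
    """
    def __init__(self, seed=0, horizon=100):
        self.rng = random.Random(seed)
        self.horizon = horizon

    def reset(self):
        self.queues = [0, 0]
        self.t = 0
        return tuple(self.queues)

    def step(self, action):
        # Random vehicle arrivals on both approaches.
        for i in range(2):
            self.queues[i] += self.rng.randint(0, 2)
        # The approach with a green light discharges up to 3 vehicles.
        self.queues[action] = max(0, self.queues[action] - 3)
        self.t += 1
        reward = -sum(self.queues)
        done = self.t >= self.horizon
        return tuple(self.queues), reward, done

# A greedy controller: always serve the longer queue.
env = ToyIntersectionEnv()
state = env.reset()
total = 0.0
done = False
while not done:
    action = 0 if state[0] >= state[1] else 1
    state, reward, done = env.step(action)
    total += reward
```

A real CityFlow experiment would replace this stub with the simulator's engine while keeping the same control loop, which is what makes simulator speed the binding constraint on sample collection.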